Use AWS Neuron SDK 2.26 #977

dacorvo · 2025-09-26T14:29:55Z

What does this PR do?

This bumps the AWS Neuron SDK version to 2.26.

This also bumps the torch version to 2.8, which in turn requires updating vLLM to 0.10.2 (the first version supporting PyTorch 2.8).
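The version coupling described above can be checked with a small guard. This is only a sketch: the helper names `release_tuple` and `meets_minimum` are hypothetical and not part of this PR, and pre-release or local suffixes (such as `rc1` or `+cpu`) are simply ignored.

```python
import re


def release_tuple(version: str) -> tuple[int, ...]:
    """Extract the leading numeric release segment, e.g. '2.8.0+cpu' -> (2, 8, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True when `installed` is at least `minimum` (numeric comparison)."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.8' and '2.8.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would rank `0.9.2` above `0.10.2`.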

There are some remaining errors in:

training tests:

FAILED tests/training/test_custom_modeling.py::test_custom_model_tie_weights - Failed: Test failed with SafetensorError: Error while deserializing header: incomplete metadata, file not fully covered

diffusers tests:

The Flux test hangs.
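To help diagnose the `SafetensorError` above, here is a minimal sketch of a layout check based on the published safetensors file format (an 8-byte little-endian header length, a JSON header, then the tensor data). It assumes the error means that the offsets declared in the header do not account for the whole data section, as happens with a truncated or partially written file. The function name `check_safetensors_layout` is hypothetical, and the real library performs more validation than this.

```python
import json
import struct


def check_safetensors_layout(blob: bytes) -> str:
    """Return 'ok' or a short description of the first layout problem found."""
    if len(blob) < 8:
        return "header length prefix is truncated"
    # The first 8 bytes hold the JSON header size as a little-endian u64.
    (header_len,) = struct.unpack("<Q", blob[:8])
    if 8 + header_len > len(blob):
        return "header is truncated"
    header = json.loads(blob[8 : 8 + header_len])
    data_len = len(blob) - 8 - header_len
    # Largest end offset declared by any tensor entry ('__metadata__' is not a tensor).
    covered = max(
        (entry["data_offsets"][1] for name, entry in header.items() if name != "__metadata__"),
        default=0,
    )
    if covered != data_len:
        return f"tensors cover {covered} bytes but the data section has {data_len}"
    return "ok"
```

A file on disk can be checked by passing `pathlib.Path(path).read_bytes()`; for very large checkpoints, reading only the header and using the file size would be preferable.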

HuggingFaceDocBuilderDev · 2025-09-30T06:44:35Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

.github/workflows/doc-pr-build.yml

tengomucho · 2025-10-01T13:25:13Z

.github/workflows/test_inf2_transformers.yml

        uses: ./.github/actions/prepare_venv
      - name: Install optimum-neuron
        uses: ./.github/actions/install_optimum_neuron
+      - name: Install datasets dependencies


Perhaps rephrase it to "Install audio tests dependencies"

Note that GitHub variables used for inputs can only be of type string, which is why the 'use_cuda' variable is not a boolean. Making the PyTorch installation configurable allows a workflow to install a specific torch version or to use CUDA (some packages are not compatible with the CPU version of PyTorch).
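Because the input arrives as a string, any script consuming it has to parse it explicitly: in Python, `bool("false")` is `True`. The sketch below shows one way to do so. The function names are hypothetical, the actual action in this PR is not shown here, and selecting the CPU wheels through the PyTorch CPU index URL is an assumption about how such a switch might be implemented.

```python
def parse_bool_input(raw: str) -> bool:
    """Interpret a string-typed workflow input as a boolean."""
    value = raw.strip().lower()
    if value in {"true", "1", "yes"}:
        return True
    if value in {"false", "0", "no", ""}:
        return False
    raise ValueError(f"unrecognised boolean input: {raw!r}")


def torch_install_args(torch_version: str, use_cuda: str) -> list[str]:
    """Build a pip command line for the requested torch flavour (illustrative only)."""
    args = ["pip", "install", f"torch=={torch_version}"]
    if not parse_bool_input(use_cuda):
        # Assumption: CPU-only wheels are taken from the PyTorch CPU index.
        args += ["--index-url", "https://download.pytorch.org/whl/cpu"]
    return args
```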

tengomucho

LGTM

tengomucho force-pushed the neuron_sdk_2.26 branch from 8dd321a to 8bbe1f4 on September 29, 2025 19:08

dacorvo force-pushed the neuron_sdk_2.26 branch from 8bbe1f4 to 83956bd on September 30, 2025 06:39

dacorvo force-pushed the neuron_sdk_2.26 branch 6 times, most recently from 4d3ca92 to 111d3ab on September 30, 2025 12:30

dacorvo added 10 commits October 1, 2025 07:50

chore: bump AWS Neuron SDK version

2b78cdf

chore: bump dev version

55345f1

chore(vllm): use AWS Neuron SDK 2.26

ef8d361

chore(vllm): bump version to 0.10.2 to support pytorch 2.8

3851949

Note that the NEURON platform is now deprecated.

ci: use AWS Neuron SDK 2.26 system components

8ca0d5b

chore(ami): use AWS Neuron SDK 2.26 AMI as base

24f54ba

ci: use pytorch 2.8.0

aa12b77

refactor(utils): use common import check method

e51b9d9
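A common import check of this kind is often written with `importlib.util.find_spec`, which locates a package without importing it. The following is a minimal sketch; the name `is_package_available` is hypothetical and is not taken from this commit.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if `name` can be located, without actually importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises for a dotted name whose parent package is missing,
        # or for a module already in sys.modules with no usable spec.
        return False
```

Routing every optional-dependency check through one helper keeps the behaviour consistent and avoids paying the import cost just to test for presence.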

tests(llm): avoid import errors

de265ae

Since fixtures are always included, avoid import errors when specific packages required by a fixture are not available.
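One way to get this behaviour is to defer the optional import to the body of the fixture, so that merely collecting the fixture module never triggers it. The sketch below uses only the standard library. The names `import_or_skip` and `docker_client` are hypothetical, and in a pytest-based suite `pytest.importorskip` would play the same role.

```python
import importlib
import unittest
from types import ModuleType


def import_or_skip(name: str) -> ModuleType:
    """Import `name` lazily, signalling a skip instead of an import error."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise unittest.SkipTest(f"{name} is not installed") from exc


def docker_client():
    """Body of a hypothetical fixture; in conftest.py it would carry @pytest.fixture."""
    # The import happens only when a test actually requests the fixture.
    docker = import_or_skip("docker")
    return docker.from_env()
```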

tests(vllm): skip tests if prerequisites are not met

ebafe1a

dacorvo force-pushed the neuron_sdk_2.26 branch 2 times, most recently from 9a0fd7c to f9e78c5 on October 1, 2025 10:06

dacorvo marked this pull request as ready for review October 1, 2025 10:07

dacorvo requested review from JingyaHuang, michaelbenayoun and tengomucho October 1, 2025 10:07

tengomucho approved these changes Oct 1, 2025

dacorvo force-pushed the neuron_sdk_2.26 branch 2 times, most recently from 4e20238 to 8a9ac50 on October 1, 2025 13:48

dacorvo added 2 commits October 1, 2025 14:04

ci(doc): bump node version

9d990e7

chore: vllm-tests import group

9c07eca

This avoids importing docker and openai for all tests

dacorvo added 3 commits October 1, 2025 14:04

chore(training): remove redundant dependency

f4980e9

neuronx-distributed is always required.

test(audio): only install prerequisites when required

5b96a42

dacorvo force-pushed the neuron_sdk_2.26 branch 5 times, most recently from a34b5bb to 1238def on October 1, 2025 14:20

ci(setup): display neuron driver info

a216486

dacorvo force-pushed the neuron_sdk_2.26 branch 3 times, most recently from c098698 to aca8f74 on October 2, 2025 11:03

dacorvo added 2 commits October 2, 2025 11:46

fix(diffusers): pin compel version

efc7369

Some tests are failing with compel>=2.2.0

test(diffusers): disable flux tests

930bf85

These tests hang with AWS Neuron SDK 2.26

dacorvo force-pushed the neuron_sdk_2.26 branch from aca8f74 to 930bf85 on October 2, 2025 11:51

tengomucho approved these changes Oct 2, 2025

dacorvo merged commit 5a01d92 into main Oct 2, 2025
12 checks passed

dacorvo deleted the neuron_sdk_2.26 branch October 2, 2025 13:39
